HappyDB is a corpus of 100,000 crowd-sourced happy moments collected via Amazon’s Mechanical Turk. You can read more about it at https://arxiv.org/abs/1801.07746.
Here, we explore this data set and try to answer the question, “What makes people happy?”
From the packages’ descriptions:
tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures;
tidytext allows text mining using ‘dplyr’, ‘ggplot2’, and other tidy tools;
DT provides an R interface to the JavaScript library DataTables;
scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends;
wordcloud2 provides an HTML5 interface to wordcloud for data visualization;
gridExtra contains miscellaneous functions for “grid” graphics;
ngram is for constructing n-grams (“tokenizing”), as well as generating new text based on the n-gram structure of a given text input (“babbling”);
Shiny is an R package that makes it easy to build interactive web apps straight from R.
library(tidyverse)
library(tidytext)
library(DT)
library(scales)
library(wordcloud2)
library(gridExtra)
library(ngram)
library(shiny)
We use the processed data for our analysis and combine it with the demographic information available.
hm_data <- read_csv("../output/processed_moments.csv")
urlfile <- "https://raw.githubusercontent.com/rit-public/HappyDB/master/happydb/data/demographic.csv"
demo_data <- read_csv(urlfile)
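Before combining the two tables, it can help to check that the join key behaves as expected. The following is a minimal sketch on toy data, not part of the original analysis: the tibbles `moments` and `workers` and their values are hypothetical stand-ins for `hm_data` and `demo_data`, and only the column name `wid` is taken from the code in this notebook.

```r
library(tidyverse)

# Toy stand-ins for the two tables (hypothetical values, not HappyDB data)
moments <- tibble(wid = c(1, 2, 2, 4), text = c("a", "b", "c", "d"))
workers <- tibble(wid = c(1, 2, 3), gender = c("m", "f", "f"))

# Worker ids listed more than once in the demographic table;
# such duplicates would multiply rows in a join
dup_wids <- workers %>%
  count(wid) %>%
  filter(n > 1)

# Moments whose worker has no demographic record;
# an inner join silently drops these rows
unmatched <- moments %>%
  anti_join(workers, by = "wid")

nrow(dup_wids)
nrow(unmatched)
```

If both counts are zero on the real tables, an inner join on `wid` keeps every moment exactly once.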
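We join the moments with the demographic data on the worker ID (wid), keep the columns we need, add a word count for the processed text, and keep only the rows whose gender, marital status, parenthood, and reflection period take one of the listed values.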
hm_data <- hm_data %>%
  inner_join(demo_data, by = "wid") %>%
  select(wid,
         original_hm,
         gender,
         marital,
         parenthood,
         reflection_period,
         age,
         country,
         ground_truth_category,
         text) %>%
  # count words on the piped column (text), not on the pre-join hm_data$text
  mutate(count = sapply(text, wordcount, USE.NAMES = FALSE)) %>%
  filter(gender %in% c("m", "f")) %>%
  filter(marital %in% c("single", "married")) %>%
  filter(parenthood %in% c("n", "y")) %>%
  filter(reflection_period %in% c("24h", "3m")) %>%
  mutate(reflection_period = fct_recode(reflection_period,
                                        months_3 = "3m", hours_24 = "24h"))
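The word-count, filter, and recode steps above can be tried on a small example first. This is a sketch under stated assumptions: the tibble `toy` and its three rows are invented for illustration, and the `"1y"` value is a hypothetical reflection period included only to show a row being filtered out.

```r
library(tidyverse)
library(ngram)

# Invented rows; "1y" is a made-up value that the filter should remove
toy <- tibble(
  text = c("went hiking friends", "got promotion", "ate cake"),
  reflection_period = c("24h", "3m", "1y")
)

toy_clean <- toy %>%
  # wordcount() is applied row by row; USE.NAMES = FALSE avoids a named vector
  mutate(count = sapply(text, wordcount, USE.NAMES = FALSE)) %>%
  # keep only the two reflection periods used in the analysis
  filter(reflection_period %in% c("24h", "3m")) %>%
  # fct_recode() takes new_name = "old_name" pairs and returns a factor
  mutate(reflection_period = fct_recode(reflection_period,
                                        months_3 = "3m", hours_24 = "24h"))

toy_clean
```

Referring to the column by its bare name inside mutate() means the count is always computed on the rows currently flowing through the pipe.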
datatable(hm_data)
Warning: It seems your data is too big for client-side DataTables. You may consider server-side processing: https://rstudio.github.io/DT/server.html
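The warning points to server-side processing, which DT supports inside a Shiny app. Below is a minimal sketch of that approach, not something this notebook does: `big_table` is a hypothetical stand-in for `hm_data`, and the output id `"moments"` is an arbitrary name chosen for the example.

```r
library(shiny)
library(DT)

# Hypothetical stand-in for hm_data, large enough to trigger the warning
big_table <- data.frame(id = seq_len(200000), value = rnorm(200000))

ui <- fluidPage(
  DTOutput("moments")
)

server <- function(input, output, session) {
  # server = TRUE keeps the data in the R session and sends the browser
  # only the rows needed for the page currently displayed
  output$moments <- renderDT(big_table, server = TRUE)
}

app <- shinyApp(ui, server)
# In an interactive session, start the app with: runApp(app)
```

The sketch uses `DTOutput()` and `renderDT()` rather than `dataTableOutput()` and `renderDataTable()` because both shiny and DT export functions with the latter names, so which one is called depends on the order in which the packages are attached.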